Abstract

We address the problem of constructing a fast lossless code in the
case when the source alphabet is large. The main idea of the new
scheme may be described as follows. We group letters with small
probabilities into subsets (acting as super letters) and use
time-consuming coding for these subsets only, whereas the letters within
a subset have the same code length and can therefore be coded fast. The
described scheme can be applied to sources with known and unknown
statistics.

Keywords: fast algorithms, source coding, adaptive algorithm, cumulative
probabilities, arithmetic coding, data compression, grouped alphabet.

1 Introduction. 

The computational efficiency of lossless data compression for large alphabets
has long attracted the attention of researchers due to its great practical
importance. The alphabet of $2^8 = 256$ symbols, which is commonly used in
compressing computer files, may already be treated as a large one, and with
the adoption of UNICODE the alphabet size grows to $2^{16} = 65536$.
Moreover, in many data compression methods the input data are first
transformed by some algorithm, and the resulting sequence is then compressed
by a lossless code. Very often the alphabet of this sequence is very large or
even infinite. For instance, the run-length code, many implementations of
Lempel-Ziv codes, grammar-based codes and many methods of image compression
can be described in this way. That is why the problem of constructing
high-speed codes for large alphabets has attracted great attention from
researchers. Important results have been obtained by Moffat, Turpin and
others.

For many adaptive lossless codes the speed of coding depends substantially
on the alphabet size, because of the need to maintain cumulative
probabilities. The time of an obvious (or naive) method of updating the
cumulative probabilities is proportional to the alphabet size $N$. Jones [3]
and Ryabko [12] have independently suggested two different algorithms, which
perform all the necessary transitions between individual and cumulative
probabilities in $O(\log N)$ operations on $(\log N + r)$-bit words, where
$r$ is a constant depending on the redundancy required and $N$ is the
alphabet size. Later many such algorithms were developed and investigated in
numerous papers.

In this paper we suggest a method for speeding up codes based on the
following main idea. Letters of the alphabet are ordered according to their
probabilities (or frequencies of occurrence), and letters with small
probabilities that are close to each other are grouped into subsets (new
super letters). The key point is the following: equal probability is ascribed
to all letters in one subset, and, consequently, their codewords have the
same length. This makes it possible to encode and decode them much faster
than if their lengths were different. Each subset of grouped letters is
treated as one letter of a new alphabet, whose size is much smaller than that
of the original alphabet. Such a grouping can increase the redundancy of the
code. It turns out, however, that a large decrease in the alphabet size may
cause a relatively small increase in the redundancy. More exactly, we suggest
a method of grouping for which the number of groups as a function of the
redundancy $\delta$ increases as $c(\log N + 1/\delta) + c_1$, where $N$ is
the alphabet size and $c$, $c_1$ are constants.

In order to explain the main idea we consider the following example. Let a
source generate letters $\{a_0, \ldots, a_4\}$ with probabilities
$p(a_0) = 1/16$, $p(a_1) = 1/16$, $p(a_2) = 1/8$, $p(a_3) = 1/4$,
$p(a_4) = 1/2$, correspondingly. It is easy to see that the following code

code($a_0$) = 0000, code($a_1$) = 0001, code($a_2$) = 001, code($a_3$) = 01, code($a_4$) = 1

has the minimal average codeword length. It seems that for decoding one
needs to look at one bit for decoding $a_4$, two bits for decoding $a_3$,
three bits for $a_2$ and four bits for $a_1$ and $a_0$. However, consider
another code

code($a_4$) = 1, code($a_0$) = 000, code($a_1$) = 001, code($a_2$) = 010, code($a_3$) = 011.

On the one hand, its average codeword length is a little larger than that of
the first code (2 bits instead of 1.875 bits), but, on the other hand, the
decoding is simpler. In fact, the decoding can be carried out as follows.
If the first bit is 1, the letter is $a_4$. Otherwise, read the next two bits
and treat them as an integer (in the binary system) denoting the index of the
letter (i.e. 00 corresponds to $a_0$, 01 corresponds to $a_1$, etc.). This
simple observation can be generalized and extended for constructing a new
coding scheme with the property that the larger the alphabet size is, the
greater the speed-up we get.
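
The example above can be checked with a short sketch. The function and constant names below are illustrative assumptions, not part of the paper; exact fractions are used so that the average lengths come out without rounding.

```python
from fractions import Fraction

# Probabilities of a_0, ..., a_4 from the example.
PROBS = [Fraction(1, 16), Fraction(1, 16), Fraction(1, 8),
         Fraction(1, 4), Fraction(1, 2)]
CODE_OPTIMAL = ["0000", "0001", "001", "01", "1"]   # first code
CODE_GROUPED = ["000", "001", "010", "011", "1"]    # second code


def average_length(code, probs):
    """Expected codeword length in bits."""
    return sum(p * len(word) for p, word in zip(probs, code))


def encode(letters, code):
    """Concatenate the codewords of a sequence of letter indices."""
    return "".join(code[i] for i in letters)


def decode_grouped(bits):
    """Decode a bit string written with CODE_GROUPED.

    A leading 1 means a_4; otherwise the next two bits are the
    binary index of the letter inside the group {a_0, ..., a_3}.
    """
    out, pos = [], 0
    while pos < len(bits):
        if bits[pos] == "1":
            out.append(4)
            pos += 1
        else:
            out.append(int(bits[pos + 1:pos + 3], 2))
            pos += 3
    return out
```

The decoder needs one branch and one fixed-width read per letter, whatever the number of letters in the group.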

In principle, the proposed method can be applied to the Huffman code, 
arithmetic code, and other lossless codes for speeding them up, but for the 
sake of simplicity, we will consider the arithmetic code in the main part of 
the paper, whereas the Huffman code and some others will be mentioned 
only briefly, because, on the one hand, the arithmetic code is widely used in 
practice and, on the other hand, generalizations are obvious. 

The suggested scheme can be applied to sources with unknown statistics.
As mentioned above, the alphabet letters should be kept ordered according to
their frequencies of occurrence while the encoding and decoding are carried
out. Since the frequencies change after the coding of each message letter,
the order must be updated, and the time of this updating should be taken
into account when we estimate the speed of coding. It turns out that
there exist an algorithm and a data structure that carry out the updating
with a few operations per message letter, and the number of these operations
does not depend on the alphabet size or on the probability distribution.

The rest of the paper is organized as follows. The second part contains
estimates of the redundancy caused by the grouping of letters, together with
examples for several values of the redundancy. A fast method of adaptive
arithmetic coding for the grouped alphabet, as well as the data structure
and algorithm for keeping the alphabet ordered according to the frequencies
of occurrence, are given in the third and the fourth parts. The Appendix
contains all the proofs.

2 The redundancy due to grouping.



First we give some definitions. Let $A = \{a_1, a_2, \ldots, a_N\}$ be an
alphabet with a probability distribution $\bar p = \{p_1, p_2, \ldots, p_N\}$
where $p_1 \ge p_2 \ge \ldots \ge p_N$, $N \ge 1$. The distribution can be
either known a priori or estimated from the occurrence counts. In the latter
case the order of the probabilities should be updated after encoding each
letter, and this should be taken into account when the speed of coding is
estimated. A simple data structure and algorithm for maintaining the order
of the probabilities will be described in the fourth part, whereas here we
discuss the estimation of the redundancy.

Let the letters from the alphabet $A$ be grouped as follows:
$A_1 = \{a_1, a_2, \ldots, a_{n_1}\}$,
$A_2 = \{a_{n_1+1}, a_{n_1+2}, \ldots, a_{n_2}\}$, ...,
$A_s = \{a_{n_{s-1}+1}, a_{n_{s-1}+2}, \ldots, a_{n_s}\}$,
where $n_s = N$, $s \ge 1$. We define the probability distribution $\pi$ and
the vector $\bar m = (m_1, m_2, \ldots, m_s)$ by

$$\pi_i = \sum_{j=n_{i-1}+1}^{n_i} p_j \qquad (1)$$

and $m_i = n_i - n_{i-1}$, $n_0 = 0$, $i = 1, 2, \ldots, s$, correspondingly.
In fact, the grouping is defined by the vector $\bar m$. We intend to encode
all letters from one subset $A_i$ by codewords of equal length. For this
purpose we ascribe equal probabilities to the letters from $A_i$ by

$$\hat p_j = \pi_i / m_i \qquad (2)$$

if $a_j \in A_i$, $i = 1, 2, \ldots, s$. Such encoding causes redundancy,
defined by

$$r(\bar p, \bar m) = \sum_{i=1}^{N} p_i \log(p_i/\hat p_i). \qquad (3)$$

(Here and below $\log(\cdot) = \log_2(\cdot)$.)

The suggested method of grouping is based on information about the
order of the probabilities (or their estimates). We are interested in an
upper bound for the redundancy (3), defined by

$$R(\bar m) = \sup_{\bar p \in P_N} r(\bar p, \bar m); \qquad P_N = \{\{p_1, p_2, \ldots, p_N\} : p_1 \ge p_2 \ge \ldots \ge p_N\}. \qquad (4)$$
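
As an illustration of the redundancy of a grouping, the following sketch computes the equalised distribution and the quantity $\sum_i p_i \log(p_i/\hat p_i)$ for a given decreasing distribution and a vector of group sizes. The function names are assumptions made for this sketch.

```python
import math


def grouped_distribution(p, m):
    """Give every letter of group i the probability pi_i / m_i.

    p -- probabilities in decreasing order
    m -- group sizes (m_1, ..., m_s), summing to len(p)
    """
    p_hat, start = [], 0
    for size in m:
        pi = sum(p[start:start + size])       # total probability of the group
        p_hat.extend([pi / size] * size)
        start += size
    return p_hat


def redundancy(p, m):
    """r(p, m) = sum_i p_i * log2(p_i / p_hat_i); zero-probability letters contribute 0."""
    p_hat = grouped_distribution(p, m)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, p_hat) if pi > 0)
```

For the five-letter example of the introduction, written in decreasing order as p = (1/2, 1/4, 1/8, 1/16, 1/16) with groups of sizes (1, 4), the redundancy is 0.125 bit, which is exactly the difference between the average lengths 2 and 1.875 of the two codes.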

The following theorem gives the redundancy estimate.

Theorem 1.

The following equality for the redundancy (4) is valid:

$$R(\bar m) = \max_{i=1,\ldots,s} \; \max_{l=1,\ldots,m_i} \; l \log(m_i/l)/(n_{i-1} + l), \qquad (5)$$

where, as before, $\bar m = (m_1, m_2, \ldots, m_s)$,
$n_i = \sum_{j=1}^{i} m_j$, $i = 1, \ldots, s$, and $n_0 = 0$.
The proof is given in the Appendix.
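
A minimal sketch of how the bound of Theorem 1 can be evaluated. It assumes the formula reads $l \log(m_i/l)/(n_{i-1}+l)$, where $n_{i-1}$ is the number of letters in the groups preceding group $i$; this corresponds to the uniform distributions on the first $n_{i-1}+l$ letters used in the proof. The function name is hypothetical.

```python
import math


def max_redundancy(m):
    """Worst-case redundancy R(m) of a grouping with sizes m = (m_1, ..., m_s).

    Scans every group i and every l = 1..m_i and keeps the largest value of
    l * log2(m_i / l) / (n_prev + l), n_prev being the letters before group i.
    """
    worst, n_prev = 0.0, 0
    for size in m:
        for l in range(1, size + 1):
            worst = max(worst, l * math.log2(size / l) / (n_prev + l))
        n_prev += size
    return worst
```

Groups of size one contribute nothing, so a grouping made only of singletons has zero redundancy, while a single group of two letters has worst-case redundancy of one bit.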

The practically interesting question is how to find a grouping which
minimizes the number of groups for a given upper bound $\delta$ of the
redundancy. Theorem 1 can be used as the basis for such an algorithm. This
algorithm has been implemented as a Java program, which was used to prepare
all the examples given below. The program can be found on the internet and
used for practical needs, see

http://www.ict.nsc.ru/~ryabko/GroupYourAlphabet.html

Let us consider some examples of such grouping carried out by this program.
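
One way such a search could look is a greedy scan that gives every group the largest size still admissible under the bound of Theorem 1. This is a sketch under stated assumptions, not the authors' Java program: the greedy strategy, the function names and the reading of the formula with $n_{i-1} + l$ in the denominator are all assumptions.

```python
import math


def group_cost(n_prev, size):
    """Contribution of one group to the bound of Theorem 1.

    n_prev -- number of letters in all preceding groups
    size   -- number of letters in this group
    """
    return max(l * math.log2(size / l) / (n_prev + l)
               for l in range(1, size + 1))


def greedy_grouping(n_letters, delta):
    """Return group sizes covering n_letters with redundancy at most delta.

    Each group is grown one letter at a time while its cost stays <= delta.
    The last group keeps its maximal admissible size, so the sizes may sum
    to more than n_letters (unused positions are simply never coded).
    """
    sizes, n_prev = [], 0
    while n_prev < n_letters:
        size = 1                                  # a singleton costs 0
        while group_cost(n_prev, size + 1) <= delta:
            size += 1
        sizes.append(size)
        n_prev += size
    return sizes
```

The cost of a group grows with its size and shrinks as more letters precede it, which is why taking the largest admissible size at every step is a natural choice.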

First we consider the Huffman code. It should be noted that in the case
of the Huffman code the size of each group should be a power of 2, whereas it
can be any integer in the case of an arithmetic code. This is because the
lengths of Huffman codewords must be integers, whereas this limitation is
absent in arithmetic coding.

For example, let the alphabet have 256 letters and let the additional
redundancy (3) not exceed 0.08 per letter. (The choice of these parameters
is appropriate, because an alphabet of $2^8 = 256$ symbols is commonly used
in compressing computer files, and a redundancy of 0.08 per letter
corresponds to 0.01 per bit.) In this case the following grouping gives the
minimal number of groups $s$.

$A_1 = \{a_1\}, A_2 = \{a_2\}, \ldots, A_{12} = \{a_{12}\}$,
$A_{13} = \{a_{13}, a_{14}\}, A_{14} = \{a_{15}, a_{16}\}, \ldots, A_{19} = \{a_{25}, a_{26}\}$,
$A_{20} = \{a_{27}, a_{28}, a_{29}, a_{30}\}, \ldots, A_{26} = \{a_{51}, a_{52}, a_{53}, a_{54}\}$,
$A_{27} = \{a_{55}, a_{56}, \ldots, a_{62}\}, \ldots, A_{32} = \{a_{95}, \ldots, a_{102}\}$,
$A_{33} = \{a_{103}, a_{104}, \ldots, a_{118}\}, \ldots, A_{39} = \{a_{199}, \ldots, a_{214}\}$,
$A_{40} = \{a_{215}, a_{216}, \ldots, a_{246}\}, A_{41} = \{a_{247}, \ldots, a_{278}\}$.

We see that each of the first 12 subsets contains one letter, each of the
subsets $A_{13}, \ldots, A_{19}$ contains two letters, etc., and the total
number of subsets $s$ is 41. In reality we could let the last subset
$A_{41}$ contain the letters $\{a_{247}, \ldots, a_{278}\}$ rather than the
letters $\{a_{247}, \ldots, a_{256}\}$, since each letter from this subset
will be encoded inside the subset by 5-bit words (because $\log 32 = 5$).

Let us proceed with this example in order to show how such a grouping
can be used to simplify the encoding and decoding of the Huffman code. If
the letter probabilities are known, one can calculate the probability
distribution $\pi$ by (1) and the Huffman code for the new alphabet
$\hat A = \{A_1, \ldots, A_{41}\}$ with the distribution $\pi$. If we denote
the codeword of $A_i$ by code($A_i$) and enumerate all letters in each subset
$A_i$ from 0 to $|A_i| - 1$, then the code of a letter $a_j \in A_i$ can be
presented as the pair of words

code($A_i$) {number of $a_j \in A_i$},

where {number of $a_j \in A_i$} is the $\log|A_i|$-bit notation of the number
of $a_j$ (inside $A_i$). For instance, the letter $a_{103}$ is the first in
the 16-letter subset $A_{33}$ and $a_{246}$ is the last in the 32-letter
subset $A_{40}$. They will be encoded by code($A_{33}$) 0000 and
code($A_{40}$) 11111, correspondingly. It is worth noting that
code($A_i$), $i = 1, \ldots, s$, depends on the probability distribution,
whereas the second part of the codeword, {number of $a_j \in A_i$}, does not.
So, in fact, the Huffman code should be constructed for the 41-letter
alphabet instead of the 256-letter one, whereas the encoding and decoding
inside the subsets may be implemented with few operations. Of course, this
scheme can be applied to a Shannon code, an alphabetical code, an arithmetic
code and many others. It is also important that the relative decrease of the
alphabet size is greater when the original alphabet is larger.
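
The second part of the codeword can be computed without any table lookup that depends on the probabilities. The sketch below maps a letter number to its group and to the fixed-width offset inside the group for the 41-group example; the names `SIZES`, `STARTS` and `split_letter` are assumptions made for illustration, and the Huffman code for the groups themselves is not shown.

```python
from bisect import bisect_right
from itertools import accumulate

# Group sizes of the example: 12 groups of 1, 7 of 2, 7 of 4, 6 of 8,
# 7 of 16 and 2 of 32 letters (41 groups in total).
SIZES = [1] * 12 + [2] * 7 + [4] * 7 + [8] * 6 + [16] * 7 + [32] * 2
# STARTS[g] = number of letters that precede group g (0-based g).
STARTS = [0] + list(accumulate(SIZES))


def split_letter(j):
    """Map letter a_j (1-based j) to (group number, offset bits in the group)."""
    g = bisect_right(STARTS, j - 1) - 1          # 0-based group index
    offset = (j - 1) - STARTS[g]                 # position inside the group
    width = SIZES[g].bit_length() - 1            # log2 of a power-of-two size
    bits = format(offset, "0{}b".format(width)) if width else ""
    return g + 1, bits
```

For example, `split_letter(103)` gives group 33 with offset bits `0000`, and `split_letter(246)` gives group 40 with offset bits `11111`, as in the text.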

Let us consider one more example of grouping, where the subset sizes
need not be powers of two. Let, as before, the alphabet have 256 letters
and let the additional redundancy (3) not exceed 0.08 per letter. In this
case the optimal grouping is as follows.

$|A_1| = |A_2| = \ldots = |A_{12}| = 1$, $|A_{13}| = \ldots = |A_{16}| = 2$,
$|A_{17}| = |A_{18}| = 3$, $|A_{19}| = |A_{20}| = 4$, $|A_{21}| = 5$,
$|A_{22}| = 6$, $|A_{23}| = 7$, $|A_{24}| = 8$, $|A_{25}| = 9$,
$|A_{26}| = 11$, $|A_{27}| = 12$, $|A_{28}| = 14$, $|A_{29}| = 16$,
$|A_{30}| = 19$, $|A_{31}| = 22$, $|A_{32}| = 25$, $|A_{33}| = 29$,
$|A_{34}| = 34$, $|A_{35}| = 39$.

We see that the total number of subsets (or the size of the new alphabet)
is smaller than in the previous example (35 instead of 41), because in the
first example the subset sizes must be powers of two, whereas there is no
such limitation in the second case. So, if an additional redundancy of 0.01
per bit is acceptable, one can use the new alphabet
$\hat A = \{A_1, \ldots, A_{35}\}$ instead of the 256-letter alphabet and
implement the arithmetic coding in the same manner as it was described for
the Huffman code. (The exact description of the method will be given in the
next part.) We will not consider further examples in detail, but note again
that the decrease in the number of letters is greater when the alphabet size
is larger. Thus, if the alphabet size is $2^{16}$ and the redundancy upper
bound is 0.16 (0.01 per bit), the number of groups $s$ is 39, and if the size
is $2^{20}$ then $s = 40$, whereas the redundancy per bit is the same. (Such
calculations can be easily carried out by the above-mentioned program.)

The use of the grouping for decreasing the alphabet size is justified by the
simple Theorem 2, for which we need to give some definitions standard in
source coding.

Let $\gamma$ be a certain method of source coding which can be applied to
letters from a certain alphabet $A$. If $p$ is a probability distribution on
$A$, then the redundancy of $\gamma$ and its upper bound are defined by

$$\rho(\gamma, p) = \sum_{a \in A} p(a)\,(|\gamma(a)| + \log p(a)), \qquad \hat\rho(\gamma) = \sup_{p} \rho(\gamma, p), \qquad (6)$$

where the supremum is taken over all distributions $p$, and $|\gamma(a)|$
and $p(a)$ are the length of the codeword and the probability of $a \in A$,
correspondingly. For example, $\hat\rho$ equals 1 for the Huffman and the
Shannon codes, whereas for the arithmetic code $\hat\rho$ can be made as
small as required by choosing some parameters, see, e.g., [14].

The following theorem gives a formal justification for applying the
grouping described above to source coding.

Theorem 2. Let the redundancy of a certain code $\gamma$ be not more than
some $\Delta$ for all probability distributions. Then, if the alphabet is
divided into subsets $A_i$, $i = 1, \ldots, s$, in such a way that the
additional redundancy (3) equals $\delta$, and the code $\gamma$ is applied
to the probability distribution $\hat p$ defined by (2), then the total
redundancy of this new code is upper bounded by $\Delta + \delta$.

Theorem 1 gives a simple algorithm for finding the grouping with
the minimal number of groups $s$ when the upper bound for the admissible
redundancy is given. On the other hand, a simple asymptotic estimate
of the number of such groups and of the group sizes can be interesting when
the number of alphabet letters is large. The following theorem can be
used for this purpose.
Theorem 3.

Let $\delta > 0$ be an admissible redundancy of a grouping.

i) If

$$m_i \le \lfloor \delta\, n_{i-1}\, e/(\log e - \delta e) \rfloor, \qquad (7)$$

then the redundancy of the grouping $(m_1, m_2, \ldots)$ does not exceed
$\delta$, where $n_i = \sum_{j=1}^{i} m_j$, $e \approx 2.718\ldots$

ii) The minimal number of groups $s$ as a function of the redundancy
$\delta$ is upper bounded by

$$c \log N/\delta + c_1, \qquad (8)$$

where $c$ and $c_1$ are constants and $N$ is the alphabet size,
$N \to \infty$.
The proof is given in the Appendix.

Comment 1. The first statement of Theorem 3 gives a construction of
a $\delta$-redundant grouping $(m_1, m_2, \ldots)$ for an infinite alphabet,
because $m_i$ in (7) depends only on the previous
$m_1, m_2, \ldots, m_{i-1}$.

Comment 2. Theorem 3 is also valid for groupings where the subset sizes
$(m_1, m_2, \ldots)$ must be powers of 2.

3 The arithmetic code for grouped alphabets. 

Arithmetic coding was introduced by Rissanen in 1976 and is now one
of the most popular methods of source coding. The advantage of arithmetic
coding over other coding techniques is that it achieves arbitrarily small
coding redundancy per source symbol at less computational effort than any
other method.

We first give a brief description of an arithmetic code, paying attention
to the features which determine the speed of encoding and decoding. As
before, consider a memoryless source generating letters from the alphabet
$A = \{a_1, \ldots, a_N\}$ with unknown probabilities. Let the source
generate a message $x_1 \ldots x_{t-1} x_t \ldots$, $x_i \in A$ for all $i$,
and let $\nu^t(a)$ denote the occurrence count of the letter $a$ in the word
$x_1 \ldots x_{t-1} x_t$. After the first $t$ letters
$x_1, \ldots, x_{t-1}, x_t$ have been processed, the following letter
$x_{t+1}$ needs to be encoded. In the most popular version of the arithmetic
code the current estimated probability distribution is taken as

$$p^t(a) = (\nu^t(a) + c)/(t + Nc), \quad a \in A, \qquad (9)$$

where $c$ is a constant (as a rule $c$ is 1 or 1/2). Let $x_{t+1} = a_i$,
and let the interval $[\alpha, \beta)$ represent the word
$x_1 \ldots x_{t-1} x_t$. Then the word $x_1 \ldots x_{t-1} x_t x_{t+1}$,
$x_{t+1} = a_i$, will be encoded by the interval

$$[\alpha + (\beta - \alpha)\, q_i^t, \; \alpha + (\beta - \alpha)\, q_{i+1}^t), \qquad (10)$$

where

$$q_i^t = \sum_{j=1}^{i-1} p^t(a_j). \qquad (11)$$
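
The interval update can be sketched as follows, with exact rational arithmetic instead of the fixed-length words of a practical coder. The function names are assumptions, letters are indexed from 0, and the cumulative sum is computed naively in time proportional to $N$, which is exactly the cost the fast algorithms avoid.

```python
from fractions import Fraction


def estimated_probs(counts, c=Fraction(1)):
    """p^t(a) = (count(a) + c) / (t + N*c) for every letter a."""
    t, n = sum(counts), len(counts)
    return [(v + c) / (t + n * c) for v in counts]


def narrow(alpha, beta, probs, i):
    """Sub-interval of [alpha, beta) assigned to the letter with index i."""
    q_low = sum(probs[:i], Fraction(0))      # naive cumulative probability
    q_high = q_low + probs[i]
    width = beta - alpha
    return alpha + width * q_low, alpha + width * q_high
```

With counts (2, 1, 0, 1) and $c = 1$ the estimates are (3/8, 2/8, 1/8, 2/8), so starting from $[0, 1)$ the third letter is assigned the interval $[5/8, 3/4)$.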



When the size of the alphabet $N$ is large, the calculation of $q_i^t$ is
the most time-consuming part of the encoding process. As mentioned in the
introduction, there are fast algorithms for the calculation of $q_i^t$ in

$$T = c_1 \log N + c_2 \qquad (12)$$

operations on $(\log N + r)$-bit words, where $r$ is the constant
determining the redundancy of the arithmetic code. (As a rule, this length
is proportional to the length of the computer word: 16 bits, 32 bits, etc.)

We describe a new algorithm for the alphabet whose letters are divided
into subsets $A_1^t, \ldots, A_s^t$, where the same probability is ascribed
to all letters in a subset. Such a separation of the alphabet $A$ can depend
on $t$, which is why the notation $A_i^t$ is used. On the other hand, the
number of letters in each subset $A_i^t$ does not depend on $t$, which is why
it is denoted as $|A_i^t| = m_i$.

In principle, the scheme for the arithmetic coding is the same as in the
case of the Huffman code considered above: the codeword of the letter
$x_{t+1} = a_i$ consists of two parts, where the first part encodes the set
$A_k^t$ that contains $a_i$, and the second part encodes the ordinal of the
element $a_i$ in the set $A_k^t$. It turns out that it is easy to encode and
decode letters inside the sets $A_k^t$, and the time-consuming operations
are needed to encode the sets $A_k^t$ only.

We proceed with the formal description of the algorithm. Since the
probabilities of the letters in $A$ can depend on $t$, we define, in analogy
with (1), (2),

$$\pi_i^t = \sum_{a_j \in A_i^t} p^t(a_j), \qquad \hat p_i^t = \pi_i^t/m_i \qquad (13)$$

and let

$$Q_i^t = \sum_{j=1}^{i-1} \pi_j^t. \qquad (14)$$

The arithmetic encoding and decoding are implemented for the probability
distribution (13), where the probability $\hat p_i^t$ is ascribed to all
letters from the subset $A_i^t$. More precisely, assume that the letters in
each $A_i^t$ are enumerated from 1 to $m_i$, and that the encoder and the
decoder know this enumeration. Let, as before, $x_{t+1} = a_i$, and let
$a_i$ belong to $A_k^t$ for some $k$. Then the coding interval for the word
$x_1 \ldots x_{t-1} x_t x_{t+1}$ is calculated as follows:

$$[\alpha + (\beta - \alpha)(Q_k^t + (\delta(a_i) - 1)\,\hat p_k^t), \; \alpha + (\beta - \alpha)(Q_k^t + \delta(a_i)\,\hat p_k^t)), \qquad (15)$$

where $\delta(a_i)$ is the ordinal of $a_i$ in the subset $A_k^t$. It can be
easily seen that this definition is equivalent to (10), where the
probability of each letter from $A_i^t$ equals $\hat p_i^t$. Indeed, let us
order the letters of $A$ according to their count of occurrence in the word
$x_1 \ldots x_{t-1} x_t$, and let the letters in $A_k^t$,
$k = 1, 2, \ldots, s$, be ordered according to the enumeration mentioned
above. We then immediately obtain (15) from (10) and (13). The additional
redundancy which is caused by the replacement of the distribution (9) by
$\hat p_i^t$ can be estimated using (3) and Theorems 1-3, which is why we may
concentrate our attention on the encoding and decoding speed and the storage
space needed.
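
The equivalence of the grouped interval and the ordinary interval with equalised probabilities can be checked on a small case. In the sketch below the function names are hypothetical, groups are indexed from 0, ordinals inside a group start from 1, and exact fractions replace fixed-length words.

```python
from fractions import Fraction


def grouped_interval(alpha, beta, group_probs, sizes, k, ordinal):
    """Interval for the letter with the given ordinal in group k.

    Only the cumulative probability over groups (Q_k) is needed, plus one
    multiplication per bound by the common in-group probability p_hat.
    """
    q_k = sum(group_probs[:k], Fraction(0))
    p_hat = group_probs[k] / sizes[k]
    width = beta - alpha
    return (alpha + width * (q_k + (ordinal - 1) * p_hat),
            alpha + width * (q_k + ordinal * p_hat))


def flat_interval(alpha, beta, group_probs, sizes, k, ordinal):
    """Same interval computed letter by letter over the whole alphabet."""
    flat = [group_probs[g] / sizes[g]
            for g in range(len(sizes)) for _ in range(sizes[g])]
    i = sum(sizes[:k]) + ordinal - 1          # 0-based index in the alphabet
    low = sum(flat[:i], Fraction(0))
    width = beta - alpha
    return alpha + width * low, alpha + width * (low + flat[i])
```

The first function sums over $s$ group probabilities, the second over up to $N$ letter probabilities; both return the same interval.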

First we compare the time needed for the calculations in (10) and (15). If
we ignore the expressions $(\delta(a_i) - 1)\hat p_k^t$ and
$\delta(a_i)\hat p_k^t$ for a while, we see that (15) can be considered as
the arithmetic encoding of the new alphabet
$\{A_1^t, A_2^t, \ldots, A_s^t\}$. Therefore, the number of operations for
encoding by (15) is the same as for arithmetic coding of an $s$-letter
alphabet, which by (12) equals $c_1 \log s + c_2$. The expressions
$(\delta(a_i) - 1)\hat p_k^t$ and $\delta(a_i)\hat p_k^t$ require two
multiplications, and two additions are needed to obtain the bounds of the
interval in (15). Hence, the number of operations $T$ for encoding by (15)
is given by

$$T = c_1 \log s + c_2^*, \qquad (16)$$

where $c_1$, $c_2^*$ are constants and all operations are carried out on
words of length $(\log N + r)$ bits, as required for the usual arithmetic
code. In case $s$ is much less than $N$, the time of encoding in the new
method is less than the time of the usual arithmetic code, see (16) and (12).

We briefly describe decoding with the new method. Suppose that the
letters $x_1 \ldots x_{t-1} x_t$ have been decoded and the letter $x_{t+1}$
is to be decoded. Two steps are required. First, the algorithm finds, with
the usual arithmetic code, the set $A_k^t$ that contains the (unknown)
letter $a_i$. Second, the ordinal of the letter $a_i$ is calculated as
follows:

$$\delta(a_i) = \lceil (code(x_{t+1} \ldots) - Q_k^t)/\hat p_k^t \rceil, \qquad (17)$$

where $code(x_{t+1} \ldots)$ is the number that encodes the word
$x_{t+1} x_{t+2} \ldots$. It can be seen that (17) is the inverse of (15).
In order to calculate (17) the decoder should carry out one division and one
subtraction. That is why the total number of decoding operations is given by
the same formula as for the encoding, see (16).
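
The two decoding steps can be sketched as below. Several things are assumptions of this sketch rather than statements of the paper: the code value is taken already rescaled to $[0, 1)$ relative to the current interval, the group is found by a plain linear scan instead of a fast cumulative-probability structure, and the ordinal is computed as floor plus one, which differs from a ceiling only when the value sits exactly on the left end of a letter's interval.

```python
from fractions import Fraction


def decode_in_group(value, group_probs, sizes):
    """Return (group index k, ordinal inside the group) for a code value.

    value       -- code point rescaled to [0, 1)
    group_probs -- total probability of every group
    sizes       -- number of letters in every group
    """
    # Step 1: find the group whose cumulative range contains the value.
    q_k = Fraction(0)
    for k, pi in enumerate(group_probs):
        if value < q_k + pi:
            break
        q_k += pi
    # Step 2: one subtraction and one division give the ordinal (1-based).
    p_hat = group_probs[k] / sizes[k]
    ordinal = (value - q_k) // p_hat + 1
    return k, int(ordinal)
```

Step 2 does not depend on the size of the group, so only step 1 has a cost that grows with the number of groups.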

It is worth noting that the multiplications and divisions in (15) and (17)
can be carried out faster if the subset sizes are powers of two. On the
other hand, in this case the number of subsets is larger, which is why both
versions can be useful.

We have not yet estimated the time needed for maintaining the order of the
letters from $A$ according to their frequencies (9). The point is that the
order should be updated by the encoder and the decoder after encoding and
decoding each letter $x_t$. It turns out that it is possible to update the
order using a fixed number of operations. Such a method is described in the
next section. Besides, we should take into account that, when $x_t$ is
encoded (or decoded), one frequency should be changed and at most two values
$\pi_j^t$ from (13) must be recalculated. It is easy to see that all these
transformations can be done with no more than two additions and two
subtractions. Therefore, the total number of operations for encoding and
decoding is given by (16) with a new constant $c_2^*$.

So we can see that if the arithmetic code can be applied to an $N$-letter
source so that the number of coding operations (on words of a certain
length) is

$$T = c_1 \log N + c_2,$$

then there exists a coding algorithm which can be applied to the grouped
alphabet $A_1^t, \ldots, A_s^t$ in such a way that, first, at each moment
$t$ the letters are ordered by decreasing frequencies and, second, the
number of coding operations is

$$T = c_1 \log s + c_2^*$$

with words of the same length, where $c_1$, $c_2$, $c_2^*$ are constants.

4 A fast algorithm for keeping the alphabet letters ordered.



In this section we describe a data structure and an algorithm that carry
out all the operations for keeping the alphabet letters ordered by their
frequencies in such a way that the number of operations per letter is
constant, independently of the probability distribution, the size of the
alphabet, and other characteristics.

The data structure suggested is based on five arrays Fr[1 : N],
SortedAlphabet[1 : N], InverseSort[1 : N], SetBegin[0 : MAX],
SetEnd[0 : MAX], where, as before, $N$ is the size of the alphabet,
$\Lambda_k^t$ is the set of the letters from $A$ whose count of occurrence
equals $k$ at the moment $t$, and MAX is an upper bound for the maximal
count of occurrence. (For example, if the code uses a sliding window to
adapt to the source, MAX is upper bounded by the length of the window.) At
each moment $t$ the array Fr contains the counts of occurrence of the
letters from $A$ in the word $x_1 \ldots x_{t-1} x_t$, so that
Fr[i] = $\nu^t(a_i)$. The array SortedAlphabet[1 : N] consists of the
letters from $A$ ordered by the count of occurrence. More precisely, the
following property is satisfied: if $i < j$, SortedAlphabet[i] = $b$ and
SortedAlphabet[j] = $c$, then $\nu^t(b) \le \nu^t(c)$. In particular, this
means that all letters from a subset $\Lambda_k^t$, $k = 0, 1, \ldots$, are
situated in succession in SortedAlphabet[1 : N] and form a string.
SetBegin[k] and SetEnd[k] contain the positions of the beginning and the end
of this string. Finally, by definition, InverseSort[i] contains the integer
$j$ such that SortedAlphabet[j] = $a_i$.

Let us consider a small example. Let $N = 4$, $t = 4$ and the counts be
$\nu^t(a_1) = 0$, $\nu^t(a_2) = 1$, $\nu^t(a_3) = 2$ and $\nu^t(a_4) = 1$.
Then Fr = [0, 1, 2, 1], SortedAlphabet = [$a_1$, $a_4$, $a_2$, $a_3$],
InverseSort = [1, 3, 4, 2], SetBegin = [1, 2, 4], SetEnd = [1, 3, 4] is one
possible configuration of the contents of the relevant arrays.

Consider next the updating of the information in the arrays, which should
be done by the encoder (and decoder) after encoding (and decoding) each
letter in such a way that only a constant number of operations is needed.
Suppose we encode the letter $a_4$ and increment its occurrence count. The
arrays should be changed as follows: the processed letter ($a_4$), whose
count is $k$, should be exchanged with the last letter from $\Lambda_k^t$
($\Lambda_1^t$ in our case), and the relevant modifications should be made
in SortedAlphabet and InverseSort. Then the processed letter should be
included in the set $\Lambda_{k+1}^t$ and excluded from the set
$\Lambda_k^t$. In fact, it is enough to change two elements in SetBegin and
SetEnd, namely, SetBegin[k + 1] = SetBegin[k + 1] - 1 and
SetEnd[k] = SetEnd[k] - 1. (In our example, $a_4$ should be moved from
$\Lambda_1^t$ into $\Lambda_2^t$. When we carry out these calculations the
result is Fr = [0, 1, 2, 2], SortedAlphabet = [$a_1$, $a_2$, $a_4$, $a_3$],
InverseSort = [1, 2, 4, 3], SetBegin = [1, 2, 3] and SetEnd = [1, 2, 4].)

We have considered the case when the occurrence count should be in- 
cremented. Decrementing, which is used in certain schemes of the adaptive 
arithmetic code, can be carried out in a similar manner. 
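
The increment step described above can be sketched as a small class. It follows the five arrays of this section but uses 0-based letters and positions, and it adds one detail that the text does not spell out: an explicit convention for empty sets (begin greater than end), which is an assumption of this sketch. Counts above `max_count` are not handled.

```python
class FrequencyOrder:
    """Letters 0..n-1 kept sorted by ascending occurrence count."""

    def __init__(self, n, max_count):
        self.fr = [0] * n                       # Fr: count of every letter
        self.sorted_alphabet = list(range(n))   # position -> letter
        self.inverse_sort = list(range(n))      # letter -> position
        # Letters with count k occupy positions set_begin[k]..set_end[k];
        # a set is empty when set_begin[k] > set_end[k].
        self.set_begin = [0] + [n] * max_count
        self.set_end = [n - 1] * (max_count + 1)

    def increment(self, a):
        """Raise the count of letter a by one in a constant number of steps."""
        k = self.fr[a]
        pos, last = self.inverse_sort[a], self.set_end[k]
        other = self.sorted_alphabet[last]      # last letter with count k
        # Exchange a with the last letter of its set.
        self.sorted_alphabet[pos], self.sorted_alphabet[last] = other, a
        self.inverse_sort[a], self.inverse_sort[other] = last, pos
        # Position `last` now leaves set k and joins set k + 1.
        self.set_end[k] -= 1
        if self.set_begin[k + 1] > self.set_end[k + 1]:   # set k + 1 was empty
            self.set_end[k + 1] = last
        self.set_begin[k + 1] = last
        self.fr[a] = k + 1
```

Feeding the letters $a_2, a_3, a_3, a_4$ into an empty structure for $N = 4$ reproduces the configuration of the small example (after converting to 1-based numbering), and one more increment of $a_4$ gives the updated arrays quoted above.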



5 Appendix. 

The proof of Theorem 1. It is easy to see that the set $P_N$ of all
distributions which are ordered by decreasing probability is convex.
Indeed, each $\bar p = \{p_1, p_2, \ldots, p_N\} \in P_N$ may be presented
as a linear combination of vectors from the set

$$Q_N = \{q_1 = (1, 0, \ldots, 0),\; q_2 = (1/2, 1/2, 0, \ldots, 0),\; \ldots,\; q_N = (1/N, \ldots, 1/N)\} \qquad (18)$$

as follows:

$$\bar p = \sum_{i=1}^{N} i\,(p_i - p_{i+1})\, q_i,$$

where $p_{N+1} = 0$.

On the other hand, the redundancy (3) is a convex function, because
direct calculation shows that its second partial derivatives are
nonnegative. Indeed, the redundancy (3) can be represented as follows:

$$r(\bar p, \bar m) = \sum_{i=1}^{N} p_i \log p_i - \sum_{j=1}^{s} \pi_j (\log \pi_j - \log m_j) =$$

$$\sum_{i=2}^{N} p_i \log p_i - \sum_{j=2}^{s} \pi_j (\log \pi_j - \log m_j) + \Big(1 - \sum_{k=2}^{N} p_k\Big) \log\Big(1 - \sum_{k=2}^{N} p_k\Big) - \Big(1 - \sum_{l=2}^{s} \pi_l\Big)\Big(\log\Big(1 - \sum_{l=2}^{s} \pi_l\Big) - \log m_1\Big).$$

If $a_i$ is a certain letter from $A$ and $j$ is such that $a_i \in A_j$,
then direct calculation shows that

$$\partial r/\partial p_i = \log_2 e \,\Big(\ln p_i - \ln \pi_j - \ln\Big(1 - \sum_{k=2}^{N} p_k\Big) + \ln\Big(1 - \sum_{l=2}^{s} \pi_l\Big)\Big) + constant,$$

$$\partial^2 r/\partial p_i^2 = \log_2 e \,\big((-1/\pi_j + 1/p_i) + (-1/\pi_1 + 1/p_1)\big).$$

The last value is nonnegative because, by definition,
$\pi_j = \sum_{k=n_{j-1}+1}^{n_j} p_k$ and $p_i$ is one of the summands, just
as $p_1$ is one of the summands of $\pi_1$.

Thus, the redundancy is a convex function defined on a convex set whose
extreme points are the elements of $Q_N$ from (18). So

$$\sup_{\bar p \in P_N} r(\bar p, \bar m) = \max_{q \in Q_N} r(q, \bar m).$$

Each $q \in Q_N$ can be presented as a vector
$q = (1/(n_i + l), \ldots, 1/(n_i + l), 0, \ldots, 0)$ where
$1 \le l \le m_{i+1}$, $i = 0, \ldots, s - 1$. This representation, the last
equality and the definitions (18), (3) and (4) give (5).
Proof of Theorem 2. Obviously,

$$\sum_{a \in A} p(a)(|\gamma_{gr}(a)| + \log p(a)) = \sum_{a \in A} p(a)(|\gamma_{gr}(a)| + \log \hat p(a)) + \sum_{a \in A} p(a) \log(p(a)/\hat p(a)), \qquad (19)$$

and, from (1), (2), we obtain

$$\sum_{a \in A} p(a)(|\gamma_{gr}(a)| + \log \hat p(a)) = \sum_{i=1}^{s} (|\gamma_{gr}(a)| + \log \hat p(a)) \sum_{a \in A_i} p(a) =$$

$$\sum_{i=1}^{s} (|\gamma_{gr}(a)| + \log \hat p(a)) \sum_{a \in A_i} \hat p(a) = \sum_{a \in A} \hat p(a)(|\gamma_{gr}(a)| + \log \hat p(a)).$$

This equality and (19) give

$$\sum_{a \in A} p(a)(|\gamma_{gr}(a)| + \log p(a)) = \sum_{a \in A} \hat p(a)(|\gamma_{gr}(a)| + \log \hat p(a)) + \sum_{a \in A} p(a) \log(p(a)/\hat p(a)).$$

From this equality, the statement of the theorem and the definitions (3) and
(6) we obtain

$$\sum_{a \in A} p(a)(|\gamma_{gr}(a)| + \log p(a)) \le \Delta + \delta.$$

Theorem 2 is proved.

The proof of Theorem 3. The proof is based on Theorem 1.
From (5) we obtain the following inequality:

$$R(\bar m) \le \max_{i=1,\ldots,s} \; \max_{l=1,\ldots,m_i} \; l \log(m_i/l)/n_i. \qquad (20)$$

Direct calculation shows that

$$\partial (l \log(m_i/l)/n_i)/\partial l = \log_2 e \,(\ln(m_i/l) - 1)/n_i,$$

$$\partial^2 (l \log(m_i/l)/n_i)/\partial l^2 = -\log_2 e/(l\, n_i) < 0,$$

and, consequently, the maximum of the function $l \log(m_i/l)/n_i$ is equal
to $m_i \log e/(e\, n_i)$, attained when $l = m_i/e$. So,

$$\max_{l=1,\ldots,m_i} l \log(m_i/l)/n_i \le m_i \log e/(e\, n_i)$$

and from (20) we obtain

$$R(\bar m) \le \max_{i=1,\ldots,s} m_i \log e/(e\, n_i). \qquad (21)$$

That is why, if

$$m_i \le \delta\, e\, n_i/\log e \qquad (22)$$

then $R(\bar m) \le \delta$. By definition (see the statement of the
theorem), $n_i = n_{i-1} + m_i$, and we obtain from (22) the first claim of
the theorem. Taking into account that $n_{s-1} < N \le n_s$ and (21), (22),
we can see that, if

$$N = c_1 (1 + \delta e/\log e)^s + c_2,$$

then $R(\bar m) \le \delta$, where $c_1$ and $c_2$ are constants and
$N \to \infty$. Taking the logarithm and applying the well-known estimate
$\ln(1 + \varepsilon) \approx \varepsilon$ when $\varepsilon \approx 0$,
we obtain (8). The theorem is proved.